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Post Kalman Progress 

David Sonnabend* 

Abstract 

In a paper here last year, [2], an idea was put forward that much greater performance could 
be obtained from an observer, relative to a Kalman filter, if more general performance indices were 
adopted, and the full power spectra of all the noises were employed. Considerable progress since 
then is reported here. Included are an extension of the theory to regulators, direct calculation of the 
theory’s fundamental quantities — the noise effect integrals — for several theoretical spectra, and 
direct derivations of the Riccati equations of LQG and Kalman theory yielding new insights. 

1 Notation 

Uppercase bold roman letters are 2 dimensional arrays; e.g., F. Lowercase bold roman or greek letters 
are column vectors; e.g., x. Lowercase greek subscripts are indices. Overdots signify time derivatives; 
e.g., x = dxjdt. A T superscript denotes transpose. Overbars signify mean values; e.g., w. Underbars 
denote random processes with the bias, if any, removed; e.g., w(t) = w(t) - w. Hats indicate estimates; 
e.g., x. Sines and cosines are denoted by s and c respectively. 

B = n x w process noise state distribution matrix 

c t = state or estimation error settling time concern level 

C = n x n matrix of white noise Lyapunov constraints 

F — n x n plant matrix 

g(u) = general controls distribution function 

G = n x u controls distribution matrix 

h(t) ~~ impulse response function 

H= z x n measurement partials matrix 

7i ~ variational Hamiltonian 

I — n x n identity matrix 

J ~ overall performance index 

J t — settling time performance index 

K — n x z observer feedback gain matrix 

L = u x n regulator feedback gain matrix 

m = order of a Butterworth filter or noise source 

M= n x n matrix used in the calculation of P x 

n = number of elements in the state vector x(t) 

N = solution of Lyapunov equation 
P x = n x n covariance matrix of x(t) 

P c = n x n covariance matrix of c(£) 

Q(u>) — n x n combined noise matrix 
R w (0) — average power of w (t) 

9ft(x) — real part of x 

S v (o;) = v x v power spectral density of v(£) 

&w(v) = w x w power spectral density of w(*) 

t = time, or more generally, independent variable in state equations 
t 3 = settling time of regulator or observer 
Tr[P) — trace of P 

u — number of elements in the control vector u(*) 
u(t) = u element vector of controls 
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U = u x u controls weighting matrix 

v = number of elements in measurement noise vector v(t) 
v(t) = v element measurement noise vector 

V = 2 x t) measurement noise distribution matrix 

w = number of elements in the process noise vector w(t) 
w(t) = w element process noise vector 
W — z x w process noise cross distribution matrix 
x(t) = n element state vector 
X = n x n state weighting matrix 

Y = n x w observer process noise effect matrix 

z = number of elements in the measurement vector z (i) 
z (t) = z element measurement vector 
Z = n x n system matrix 
0 = zero vector or matrix 

T = z x z combined measurement noise matrix 
Sj, = Kronecker delta (=1 if j = k; =0 otherwise) 

S(t) = z element measurement residuals vector 
e(£) = n element estimation error vector 
© = n x n combined weighting matrix 
A = eigenvalue of Z 

A — nxn matrix of Lagrange multipliers 

2 = n x n estimate error weighting matrix 

$ = n x n noise effect integral 

(j — angular frequency 

u) c — break frequency of noise spectrum 

— half power frequency of noise spectrum 


2 Regulator and Observer Structure 

Throughout this paper, I’ll be dealing with systems specified by an n element state vector x(t), obeying 
a set of 1st order ordinary differential equations. I’ll assume that, after some suitable linearization, these 
may be written: 

x(t) = Fx(t) + g[u(t)| + Bw(() (1) 

Here, u(t) is a u element control vector, and w(t) isaw element process noise vector. Each element 
Wj(t) is regarded as stationary, and described by the power spectral density S W j(u>), where ui is angular 
frequency. Also, F is the “plant” matrix, and W is the process noise distribution matrix, both regarded 
as independent of time. While some Wj{t) might affect more than 1 state equation, W is constructed 
so that all the uij(t) are statistically independent. Finally, the possibly nonlinear g[u(t)] expresses the 
effect of the controls on the state. 

In a linear proportional regulator, where the intent is to hold x(t) close to zero, in spite of w(t), we take 

g(u(t)l = Gu(i) ( I 2 ) 

for some fixed n x u matrix G; and then let 

u(t) = -Lx(t) (3) 

for some fixed u x n matrix L. When these relations are substituted into (1) there follows: 

x(t) = Zx(t) + Bw(t) (4) 


where 


Z = F — GL 


(5) 


I shall refer to the n x n matrix Z as the regulator system matrix. It will reappear in another guise 

in observer theory below. The next section will deal with new methods for choosing the feedback gain 
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matrix L, and how to calculate the performance. As a 1st application, Linear Quadratic Gaussian theory 
(LQG) is derivable from this more general theory. A sketch of the proof is given in Section 5. 

Turning now to observers, suppose a system is described by the state equations (1). We wish to determine 
the current value of x(t) by the use of these, supported by some set of measurements z (t). Suppose further, 
that after some suitable linearization, these measurements are described by the model 

z(t) = zb + Hx(£) T Vv(t) + Ww(£) (6) 

a z element vector. The 1st term on the right is the assumed known bias in the measurements, partly due 
to undesired offsets in the instrument, and partly from the linearization. Next, the assumed fixed z x n 
matrix H comes from the linearization, and is known in estimation lingo as the “measurement partials”. 

The measurements are assumed to be contaminated by some set of v statistically stationary noises v(£). 
Like the process noise w (£), a fixed z x v distribution matrix V is introduced to insure that all the 
are independent. As it sometimes happens that measurements are also contaminated by the process noise, 
I have included such a term, with an appropriate distribution matrix W. It was required in [2]; however, 
in most practical cases the term can be omitted; still, its presence leads to an interesting generalization. 

An observer based on these plant and measurement models starts with an estimate x(£) of x(£). This is 
generated by a computer simulation of the deterministic parts of the plant equations (1), corrected as 
follows by the measurements: 


x(£) = Fx(t) + g[u(£)] 4- Bw 4- K6(t) (7) 

where what estimation types like to call the “residuals” are defined by: 

6{t) = z (t) - z B - Hx(t) - Vv - Ww (8) 

that is, the difference between the actual measurements z(t) and their reconstruction in the computer. 
Here, the biases w and v in the noises are assumed known. The nx z feedback gain matrix K is named for 
Kalman; but in the more general theory in Section 4, it's not derived with the Kalman filter assumptions. 

On introducing the error in the estimate: 


e(t) = x(t) - x(t) (9) 

the residuals (8) may be rewritten as 

6(t) = Vv(£) + Ww (t) - H €(t) (10) 

When this is substituted into (7), and the plant equations (1) are subtracted, there results: 

e(t) = Z c(t) + KVv(i) + Yw (t) (11) 

in which 

Z = F — KH ; Y = KW-B (12) 

I will call the n x n matrix Z the observer system matrix, in order to stress the similarity of (11) to 
the regulator behavior (4). Indeed, one may regard this observer as a regulator, whose intent is to force 
the observation error c(£) close to zero, in spite of all the noises. Observer performance when subject to 
arbitrary noise is discussed in Section 6; and the specialization to the now obsolete Kalman theory in 
Section 7. 

A few observations. In either of these systems, it must be possible to choose the feedback gain (L or 
K) such that Z < 0 (negative definite). If this isn’t possible, then either (4) or (11) will diverge, and 
the system is said to be uncontrollable or unobservable. In what follows, I’ll always assume that such a 
choice is possible. 
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The theory presented here got started about 5 years ago, when W. M. McEneaney, in unpublished 
notes, demonstrated that the terminal covariance of c(t) in a Kalman filter could be calculated directly, 
if everything was stationary, without integrating any differential equations. The idea was extended to 
regulators, and to arbitrary noise power spectra, in [1]. Several other papers on the subject have been 
written, culminating in last year's paper here, [2]. A book [3], examining the subject in much greater 
depth, and containing all the proofs, is now nearing completion. 

3 Regulator Performance 

In the above regulator, with statistically stationary noise, initial transients will die out, and the statistics 
of x(£) will tend to asymptotic values. Of these, the mean and the covariance are the most important. 
I’ll sketch the results to date of the new theory; but page limits prevent my giving the proofs, or much 
discussion. Following that, I’ll discuss a more general performance index than is usually seen in regulator 
design. 

Starting with the mean, if expectation is applied to (4), then after settling: 

x = — Z -1 Bw (13) 

where, since Z < 0, it's non-singular. Thus, x(f) has a bias if and only if w (t) has one. 

As for the covariance, since (4) is linear, it has a solution for x(t) in terms of an integral over w(t). From 
this, the outer product of x(t) with itself may be constructed as a double integral, and the expectation 
applied, leading by and by to an expression for the terminal covariance P x of x(t), in terms of a double 
integral over the autocovariance of w (t). On applying a Fourier transformation, the expression is con- 
verted to the frequency domain, and after working through another page of dense algebra, this general 
result emerges: 

,oo _ 

P x = j ^(Z + u7 2 Z -1 ) -1 N(u>) + N(w) (Z T + w 2 Z^ 7 ') - j du> (14) 

where N(cj) is the solution of the Lyapunov equation 

ZN(u) + N(u>)Z r = BS,„(u;)B t = Q(u>) (15) 

Here, S w (u>) is a diagonal matrix, whose non-zero element S _ WJJ (^) is the one sided power spectral density 
of u^(£). Also, the normalization of the Fourier transform is such that the average power in w_j{t) is 



Thus, for a given gain L, Z is calculated from (5). Then N(tt>) is determined by by solving (15) for each 
of a dense set of u values; after which P x is obtained by the numerical integration of (14). Tedious, but 
at least all the S_ W jj(u) vanish above some finite u;, for any practical spectra. 

This was the status of the theory in the earlier papers. Since then, a dramatic improvement in this 
procedure has been found. By construction from (14) it is easy to show that P x obeys its own Lyapunov 


equation: 

ZP X + P X Z T = m + m t 

(17) 

where 

w 



M= E^BjBJ 

(18) 


j=i 

Here, B j is the j th column of B, and 


*j= / (Z + w 2 Z _1 ) _1 5 (o;)da» (19) 

Jo 

I have called these quantities the noise effect integrals. The current progress in determining these for 
several theoretical spectra is given in Sections 4 and 8. Note that, while numerical integration may still 
be needed to find some <1^, there is now only one Lyapunov equation to solve to get P x . 
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In modern control theory, as applied to regulators, it’s common to measure performance by a linear- 
quadratic index as follows: 

J = Tr |(X + L t UL)(P x 4- xx r )] (20) 

Here, X is a weighting matrix, intended to express dislike for each of the elements xj(t). In a practical 
weighting scheme we require both X > 0, and that it be symmetric. In “Bryson weighting” , X is diagonal, 
and each Xjj is the inverse square of the level of Xj(t) that would cause a given amount of pain. Similarly, 
U is a weighting matrix, intended to express dislike for the use of controls. If it’s obtained by Bryson 
weighting, the diagonal element U kk is the inverse square of the level of u k (t) that would cause the same 
amount of pain. Note that J is dimensionless, if it’s constructed in this way. Overall, if you believe that 
this J truly expresses your desires in the design of your regulator, then it only remains to find that value 
of the gain L that yields minimum J. I’ll not get into the derivation of (20), as it’s given in most books 
on the subject. 

While most theoretical work tends to rely on some variation of (20), there are other issues the designer 
must face. Perhaps most important is settling time; i.e., the time for the regulator to recover from 
arbitrary initial conditions, or unmodeled disturbances. In the system (4), settling consists of the behavior 
of n modes, each of which settles exponentially according to the eigenvalues of Z. More precisely, if some 
eigenvalue is A = o + ip, then the settling time of the corresponding mode is —l/a (all cr < 0); and the 
overall settling time t 3 is the largest of these. 

In the improved theory, concern for settling time is dealt with by adding some function of t 3 to J. I have 
used t 3 /ct , where c t is the time that yields the same level of pain used in the state and control weightings. 
However, a case could be made for using the square of this instead, or perhaps the sum of such terms for 
each eigenvalue. In any case, the added term doesn’t depend on the noise, only the choice of L. 

4 White, Colored, &; Butterworth Noise 

In this section I’ll begin the analysis of the noise effect integrals treating those cases where S^lj) 
doesn’t vanish above some finite to. The simplest of these is “white” noise, for which S(co) = S. Note 
that, by this definition, white noise can’t have a bias, as this would imply an infinite spike at co = 0. Some 
readers may have heard me fulminate against this stuff before; here I’ll confine my antipathy to pointing 
out that any such process would have to contain infinite power, for which our universe lacks the resources. 
Still, the assumption that all noises are white has led to the enormous practical simplifications of LQG 
and Kalman theory, to where white noise has acquired a sort of mystical reality. It’s my hope that papers 
such as this will convince readers that the promise of better performance outweighs mathematical and 
numerical simplicity. 

Enough fulmination. An involved argument based on an eigensystem decomposition of Z leads to a set 
of scalar arctangent integrals. Reconstruction then yields 


*= -(7t/2)SI (21) 

where I is the n x n identity matrix. Observe that this result appears to be independent of Z, a property 
not possessed by any other S(u>) I’ve looked at. This is the root cause of the simplifications of LQG 
and Kalman theory. Not quite independent — the analysis depends critically on Z < 0. For the reader 
interested in verifying this result, caution: of the half dozen or so references on my shelf listing arctangent 
expansions, none were completely correct. 

Next, colored noise. Some in the field regard any non-white noise as colored; but most accept the 
definition of a colored noise u(t) as obeying 


where w(t) is white, and to c 
shown to be 


ii(t) = u? c [w(t) - u(t)] (22) 

is the “break” frequency. The power spectrum of such a process may be 
2R(0)co c 


S(o;) = 


7r(w 2 + w2) 


(23) 
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where R( 0) is the average power. Physically, u(t) is the result of passing white noise through a 1st order 
linear filter, whose break frequency is u c . I’ve never seen such a spectrum, and I doubt that you have; its 
utility comes from a well known technique, in which (22) is appended to the plant equations, when the 
white noise source is included in either the process or measurement noises, as needed. As the new ideas 
don’t require this artifice, I’ll not discuss it further. Properties of colored noise are that S(u> c ) = 5'(0)/2, 
and that half of the total power is in the “tail”, i.e., in the region uj c < u < oo. 

If (23) is substituted into (19), a sort of partial fractions expansion causes the white noise integral above 
to surface, leading to 

* = fl(0)(Z-w c I) _1 (24) 

So long as Z < 0, the matrix on the right is non-singular, and this formula is a big improvement over 
infinite numerical integration. 

Since real noises generally roll off much faster than colored noise, I have introduced a generalization I’ve 
called “Butterworth” noise. It is the result of passing white noise through an m pole low pass Butterworth 
filter. The power spectrum of such a process may be shown to be: 



Since (25) reduces to (23) for m = 1, colored noise might be referred to as 1 pole Butterworth noise. 
The property S(u> c ) = S(0)/2 continues to hold for all m; but the fraction of the total power in the 
tail drops rapidly with increasing m; e.g., 0.21945 for m = 2, and .098931 for m = 4. As a practical 
matter, instruments troubled by broad band noise frequently have Butterworth circuits added prior to 
digitization, to avoid “aliasing 55 . The resulting spectrum tends to look rather like (25), with lj c chosen 
well below the sampling frequency. If this sounds like your situation, then m = 4 is what you are most 
likely to encounter, as it has a straightforward implementation by a circuit comprising 2 operational 
amplifiers. 


If (25) is substituted into (19), the same technique used for colored noise works, yielding an analytic 
solution good for all m: 


= w c 2m s(^)fl( o)K 2m i + (-irz 2ni ] 


- 1 


^(-l) J w c - 2j Z 2j ‘esc 
J= 1 


7r(2m — 2j 4- 1) 


2m 


-™i 

UJ C 


(26) 


It’s not hard to show that this reduces to the colored noise effect integral (24) for m = 1. As for more 
poles, I’ll tabulate the next few: 

$2 = R(Q) fz 3 - - y/2w 3 lj (Z 4 +^I) _1 (27) 


* 3 = ^fl(O) (2Z 5 - u, 2 Z 3 + 2^Z + 3 o> c 5 I) (Z 6 - a/®l) “ 1 

- ft(0) (z 2 -2u; c Z-f (Z-wJ)" 1 (Z 2 -w c Z + w 2 l) _1 (28) 

$ 4 - R( 0) [Z 7 - Jfc,w 2 Z 5 + fciw 4 Z 3 - - k 2 u> 7 c l] (Z 8 + w 8 I) _1 (29) 

where 

ki = y/2 - 1 = 0.4142135624 ; fc 2 = 2^2 - \/2 = 1.5307337295 

Additional results for m < 8 will appear in [3]. 
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5 LQG Theory 

While I have railed against white noise above, it is the foundation of the popular Linear-Quadratic- 
Gaussian method of designing some kinds of control systems. For a regulator, the assumptions are that 
all the noise is white and has a Gaussian probability density, and that we wish to choose the feedback gain 
L to minimize the performance index (20). I’ll show how the main results of LQG theory for regulators 
may be derived from the new theory. As will be seen, no use is made of the Gaussian assumption, showing 
that it’s irrelevant in this context. 

To begin, if the noise is all white, then Q(c«;) is independent of uj in (15); so this must also be true of 
N(u;). Thus, (14) is reduced to a pair of terms involving the white noise effect integral (21), from which 
P x = — 7rN; and on substituting this back into (15) we have: 

G = ZP x + P X Z T -f- 7rQ = 0 (30 ) 

This set of constraints must be enforced while minimizing J. To do this I’ll introduce the variational 
Hamiltonian 

H{ P x , L, A) = Tr [0P X + AC] (31) 

where A is a symmetrical matrix of Lagrange multipliers, and the last term is really the sum over the 
direct product of A and C. Also 

0 = X + L t UL (32) 

and, from (13), the x term has disappeared, because white noise by definition has no bias. 

When formulated in this way, the necessary conditions for a minimum are that 'H{ P x , L, A) be stationary, 
relative to variations in P x and L. It’s not hard to show that the 1st set of conditions leads to another 
Lyapunov relation 

0 L AZ -j- Z r A = 0 (33) 

from which it may be shown that A > 0, and is thus non-singular. The other necessary condition leads 
to 

L = U _1 G t A (34) 

and we see why it was important to make U non-singular. It only remains to expand 0 and Z in (33), 
and eliminate L with (34). After cancelling terms, we are left with 

AGU _1 G r A = AF 4- F r A + X (35) 

This is the central result in LQG theory for regulators. After solving this matrix Riccati equation for A, 
(34) yields the optimal L; and P x may be obtained by solving the Lyapunov equation (30). If desired, 
J may then be found from (20). In other treatments I’ve seen of this problem, A is introduced by quite 
different routes. Its interpretation as a matrix of Lagrange multipliers is, I think, new. 

6 Observer Performance 

In contrast to the regulator, the inclusion of the known biases in the observer (7) and (8), and the 
measurement model (6), mean that the estimate error e(t) is free of bias, as may be seen from (11). The 
procedure for determining the covariance P € of e(t) follows the same plan as that of P x in the regulator, 
leading to the same result (14), with P e replacing P x . Again, N(cj) obeys a Lyapunov equation: 

ZN(u>) + N(u/)Z T = KVS t ,(u)V T K T + YSJwJY 7 ’ ee Q( w ) (36) 

There is an important difference from regulator theory — this time Q (u>) depends on the feedback gain 
K but the direct evaluation of P e is pretty much the same. The improved procedure (17) also works 
here, with P € replacing P x ; but this time 

V W 

M = £ *j [KV]j[KV][ + ]T * k Y k Yl (37) 

j=l k = 1 
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and as before, [KV], is the jth column of KV, and Y k is the kth column of Y. Also, is again the 
noise effect integral (19); while is similar, but depending on the measurement noise spectrum S vk k ( <*>)• 

If observers measured performance in the same way as regulators, then from (20), and the discussion 
following it, we would measure observer performance by: 

J = Tr [SP € ] + J t ( 38 ) 

where 3 is a symmetric weighting matrix, expressing our concern for the errors e(t); and J t penalizes 
observer settling time, and is constructed from the eigenvalues of the observer system matrix Z. A 
straightforward way to get 3 is Bryson weighting, as discussed in Section 3. In contrast with (20), the 
bias term is missing because e(t) is unbiased; and the control weighting term is missing, because P f doesn t 
depend on u(t), and control usage isn’t a concern of the observer designer. This is the performance index 
employed in the new theory; but it’s not seen today, largely because Kalman theory (see next section) is 
based on a different, and quite inferior idea. 

7 Kalman Theory 

Like LQG theory, Kalman theory for observers can be deduced directly from the more general results in 
the last section. A “Kalman filter” is an observer with the structure given in Section 2, but burdened 
with 2 rather unfortunate assumptions. One is that all the process and measurement noises are white 
and Gaussian, which I have already excoriated in Section 4. The other is that observer performance 
be measured by the residuals (8), rather than the estimation errors e(t). Penalizing residuals has some 
statistical justification, but fails to consider what designers want to achieve. 

In present practice, almost every observer has been constructed from some extension of Kalman theory. 
Today’s practical filters have been built from a set of improvements introduced by very competent people, 
many of whom I have known and respect. However, nearly all of them are essentially applied mathemati- 
cians, more concerned with rigor than physical reality and the needs of the designer. Rigor is fine; but it 
ain’t everything. 

I’ll begin with the performance index. This is tricky, because, in Kalman theory, the residuals are weighted 
by the inverse covariance of the measurement errors, which for white noise is zero. This is usually side 
stepped by some flummery involving a Dirac delta function, eventually leading to a performance index of 
the form (38), but without the J t term. However, by starting from (10), we can prove without flummery 
that the Kalman assumption leads to 

J = Trjl^r^HPj =Tr[0P e ] (39) 

where 

r = rr(VS„V T + WS 1U W 7 ') (40) 

Unlike LQG theory, this © doesn’t depend on the gains. The 1st term in T corresponds to what’s usually 
seen in Kalman theory, but the latter comes from including the process noise in the measurement model, 
a modest generalization. It should be clear that T will be non-singular, provided some noise contaminates 
every measurement. 

The next step follows LQG theory. If all the noises are white, then P c = -ttN, an< 3 Lyapunov relation 
becomes 

C = ZP C + P C Z T + ttQ = 0 (41) 

We again need to minimize J, relative to K, and subject to the constraints (41). As in Section 5, we may 
use a variational Hamiltonian: 

H{ Pe, K, A) - Tr [0Pe + AC] (42) 

and the necessary conditions for a minimum are that it’s stationary with respect to variations in P e and 

K. 

For the 1st set of conditions, the dependence of 7i( P e , K, A) on P c is the same as the earlier ?i(P X i L, A) 
on P x ; so we are again led to (33). This time, the relation only serves to establish that A > 0, and is 
therefore non-singular. 
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In working out the 2nd set of necessary conditions, observe that, unlike LQG theory, 0 doesn’t depend 
on L, but Q does. On expanding Z and Y with (12), it can be rewritten as 


ttQ = KrK T - Ktf r - ¥K t + A (43) 

where 

A = ttBS^B 7, ; * = 7rBS,„W T (44) 

Differentiation of the Hamiltonian with respect to K is now possible, eventually leading to 

K= (P e H r + 'J')r - 1 (45) 

I’U note that the non-singularity of A and T are both needed in proving (45). Actually, we have already 
seen that T can be singular only if some measurement is uncontaminated. If there is no noise, then 
® ar, d from (11), e(f) ► 0; so that P e = 0, when any K could be chosen, so long as Z < 0. You 
can work out intermediate cases yourself. 


When (45) is substituted back into (41), an algebraic Riccati equation in P c emerges: 

P € ©P e + (*r 'H - F)P C + P e (H T r l * T - F r ) + W 1 * 7, — A = 0 (46) 

As the only unknown here is P e , it may be solved for numerically, when (45) immediately yields the 
optimal (choke) K. This looks pretty complicated; but if process noise wasn’t included in the measurement 
model, then ^ = 0, and (46) and (45) reduce to well known Kalman results. 


As a final note, I’ll point out that no use was made of the usual assumption that the noises are Gaussian; 
so that assumption is unnecessary. That it was required in Kalman theory may be traced to the need to 
equate minimum error covariance to the notion of achieving the maximum likelihood that you’ve got it 
right, a statistical finesse not essential to the theory. 

8 Bounded Polynomial Noises 

It s often true that measurement noise can be studied in the laboratory, and accurate power spectra 
determined. Unexpected bumps in the spectrum may then be used to uncover problems that can be 
alleviated by design improvements. By contrast, process noises are hard to measure; and even if known, 
have little application in current design practice. Since the new theory demands this information, what 
do we do if we can’t get it? Well, as a general rule, the better our information, the better our ultimate 
performance should be. If our information on some spectrum is poor, any existing measurements should 
be combined with physical reasoning to estimate the average power and shape of the spectrum. 

If the estimated spectrum shape is analytically simple, it may be possible to evaluate the noise effect 
integral (19), for a given Z, without direct numerical integration. This has already been done for several 
spectra in Section 4. Here, the general class of shapes characterized by bounded polynomials is examined; 
and a few are completely worked out, along with a general procedure for extending the list. 

The general problem is solvable provided we can evaluate the class of integrals defined by: 


F(Z, k, 0 = f (Z + w^Z~ 1 )~ l u> k d(jLj 
Jo 

It can be shown that these integrals are all given by: 


(47) 


F(Z, k } t) = (— l) fc / 2 Z fc 
F(Z, k, t) = (_1)(^U/2 Z * 


k/2 

tan -1 (tZ _1 ) + V - — l -L t V-i Z i-2j 


(*- l )/2 


£ ln(I + t 2 Z~ 2 ) + V' LJZ t 2 j z - 2 j 
2 7 =i 2j 


(k even) 


(k odd) 


(48) 


(49) 
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For these formulas to be helpful, it’s necessary to have a clear understanding of what’s meant by the 
arctangent and the logarithm of a matrix. With considerable care about the regions of convergence, these 
matrix functions may be defined by power series generalized from known scalar series; although I must 
again caution the reader that all the standard references f’ve seen get at least the arctangent wrong. 
Anyway, it may be shown that, with these functions so defined, they may be evaluated by these relations: 

tan _1 (A) = / (A“ l + x 2 A) _1 dx (50) 

Jo 


ln(I + A)= /^(A -1 + yl)~ 1 dy (51) 

Jo 

In both cases, since the integration interval is fixed, evaluation by the Gaussian technique will yield any 
required accuracy, without much calculation. Both these formulas run into numerical trouble for large 
A; methods for modifying them to remove the difficulties will be given in [3]. 


For the simplest application of this machinery, consider the flat bounded spectrum; i.e., white noise that 
has somehow been cut off sharply. For this, S (a?) = R(0)/uj c for 0 < w < u> c , and zero otherwise. On 
applying the above relations we find 


(52) 


While there is no practical way to generate a process with this spectrum, it may be considered as the 
limit of Butterworth noise, with the same f?(0) and u> c , as m -* oo. I’m not sure how to prove this; but 
I’ve tested it numerically at m = 8, with good agreement. 


The next order of complexity is the linear spectrum. It has a peak value at u> = 0, drops linearly to zero, 
and terminates. More precisely, 

S(w) = (l — (0 < w < 2 u) c ) (53) 

and zero otherwise. On applying the above theory, the corresponding noise effect integral becomes 


$ 


m 


tan 


u> c 


1 (2w c Z- 1 ) - — Z\n(l + 4u?Z 
4uv 


2ry-2\ 


(54) 


Onward. The cubic power spectrum is initially flat, then falls off according to a cubic polynomial, 
flattening again and terminating when it reaches zero. The spectrum may be shown to be. 




fl( 0) 




for 0 < u> < 2u c and zero otherwise. This looks superficially like colored noise; but only 3/16 of the 
power is in the tail, compared to half for colored noise; and the frequency within which half of the total 
power is found is 0.53277 u> c , compared to u> c for colored noise. This time the noise effect integral turns 


out to be 


$ = 


«( 0 ) 


!( I+ 4 z2 ) 


tan 


1 (2u7 c Z _ 1 ) - _Z 3 ln(l + 4^Z- 2 ) - -Z 


U>c 


(56) 


This spectrum was employed in [2] to describe satellite drag variations, for which very little flight data 
exists. However, most of the numerical work was based on a more or less equivalent colored noise 
spectrum, as the noise effect integral theory had not yet been implemented. 


All these spectra (not including white noise) might be called 2 parameter spectra, as they are completely 
prescribed by R( 0) and u; c . There are other possibilities for 2 parameter spectra, and a considerable 
range of choices for 3 parameters, none of which have been looked at. Moreover, I believe that most 
spectra we’re likely to encounter could be reasonably approximated by some combination of bounded 
polynomials. Further afield, there are several theoretical spectra, such as that for thermal noise, for 
which we might be able to calculate ^ analytically. 
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9 Low &; High Frequency Noises 

For theoretical purposes, it’s interesting to see what happens if a particular noise spectrum S^cj) is 
concentrated in a band well below the system dynamics; i.e., u; c is much closer to the imaginary axis than 
any of the eigenvalues of Z. To do this, we can let c o c — ► 0 in each of the above noise effect integrals, while 
holding R( 0) fixed. Except for white noise, where the idea is meaningless, the results for all spectra are 

lim *= R(0)Z~ l (57) 

We may conclude that the shape of the spectrum doesn’t much matter, if the bulk of the power is well 
below the system dynamics. This also serves as a valuable check on the formulas for each <I>. 

In the converse situation, where to c is well above the system dynamics, i.e., where uj c is much further 
from the imaginary axis than any eigenvalue of Z, we get a rather different result. This time, since we 
expect only the low frequency power to have much effect, we hold 5(0) constant, while letting u/ c — oo, 
rather than fixing (0) . This time, for every spectrum above, 

H m $ = —^5(0)1 (58) 

As Z doesn’t appear in the result, we now find that everything looks like white noise, and the dynamics 
make little difference, if they are slow compared to cj c . And we have another valuable check on the 
formulas. 

10 What’s Next? 

The new approach to optima] estimation and control, advanced in this paper, is barely a beginning. If 
the history of the development of LQG and Kalman theory is any guide, it will be several years before 
the theory will be developed to the point where it sees regular use in design, and begins to enter the 
engineering curriculum. After the next few months, my crystal ball gets pretty murky; but here is my 
vision, for what it’s worth. 

To begin, I plan to be able to fill orders for [3] before the end of this year. At around 200 pages, it will 
greatly amplify on the present paper, including material on noise statistics, power spectra, and matrix 
manipulation that s difficult (occasionally impossible) to find elsewhere. Also planned for that book are 
an extension of the present theory to cover the practical situation where an observer is used as the source 
of information for a regulator; so that both sets of optimal feedback gains need to be found. Further 
additions should include a beginning in understanding the transient behavior of the state and estimation 
error covariances; and several examples of the application of the theory, showing the improvements that 
may be expected relative to LQG or Kalman theory. 

Further afield, I see the next major extension is in the area of sampled and quantized measurements, 
and discrete updates both in controls and observation. The present theory might be regarded as the 
oversampling limit of a fully digital implementation, with unlimited computational resources. This causes 
3 new issues to surface. 1st, better performance costs money — terms could be added to our performance 
index penalizing increased digital precision, and more rapid sampling and updates. 2nd, even in the 
absence of noise, an exact model of the measurements is no longer possible. That is, all the available 
information (the complete set of past measurements and current and prior state estimates) is insufficient 
for an exact reconstruction of the current measurement. This is true whether “sample” means a true 
point measurement, or an average over the sampling interval. 3rd, settling times may be affected by these 
digital details. These issues have all been examined in the context of current practice, but will need to 
be revisited within the new performance philosophy. 

Another matter of great practical importance will be to deal with non— stationary systems; i.e., those in 
which the plant and measurement parameters, and the noise properties, may vary with time. In present 
practice, such problems are treated by something amounting to a continuous integration of a matrix 
Riccati equation, causing the covariances and feedback gains to evolve in time. Unfortunately, at this 
writing, I have no clear view of how these methods might be generalized to encompass arbitrary noise 
power spectra. Indeed, even the notion of a quasi-stationary spectrum will need a careful definition. 
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There are several other obvious shortcomings. All the noise and measurement biases are here assumed 
known and invariant. The techniques of bias estimation and integral control are well known; and it should 
be possible to bring them into the new theory, without much difficulty. Another issue is robustness; i.e., 
how to deal with errors in the knowledge of the system parameters. I expect that this will require lots of 
work. 

Another issue swept under the rug at the beginning of the paper was linearization — where did all those 
fixed matrices come from? While this has long given us pain, and matters are far from settled in current 
practice, I doubt that the new theory will be any worse in this respect. And then there are your insights. 
Overall, I welcome anyone who wants to contribute to this newborn field. Talk to me. 
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